Practice using CSV files


This is a “pair” assignment, which means that if you are working on a team with someone else, you and your partner should do your best to engage in the pair programming model. At any point in time, one of you is “driving,” i.e. actually using the keyboard and mouse. The other is actively engaged following along, preventing bugs, and providing ideas.

Over the course of an assignment, make sure that you each spend roughly the same amount of time “driving.” I will also ask you to turn in a form rating the work that your partner does. My recommendation is to switch roles about every 15 minutes. Set a timer to help you remember.

1. This is an optional assignment.

I was going to assign it, but I decided to give people a little more time to work on the final data analysis assignment instead. So there’s nothing to turn in. However, this would be really good practice for the final assignments, so we will take some time in class to work on it.

2. Get started

Create a folder in which to store your work for this assignment.

  • If you are working on your own computer, it’s up to you where to put the folder. Your desktop is likely as good a place as any. Make a folder titled csvpractice.
  • If you are working in the labs in Olin, make sure to first mount the COURSES folder, so that you won’t lose your code when you log out. Once you’ve done so, open up Finder, then navigate to your personal student work folder. You can then make a csvpractice folder within there.
  • Next, open your new folder in VS Code. To do so, start up VS Code, then drag your folder onto the VS Code window. This should open the folder within VS Code. If prompted, confirm that you trust the authors.

3. The assignment

In this assignment, you’ll do a warmup data analysis activity to help prepare you for the bigger version that you’ll do on the next assignment. I have supplied a dataset of baby names for babies born in the US in 2021, obtained from the US Social Security Administration. You should answer the following question:

  • How many names were given of each length? That is, how many names of length 2 were given, how many of length 3, and so on? Include all genders together (don’t separate your counts by gender).

In order to do this, you should make use of csv.DictReader in Python to read in the data, and you should use a dictionary to keep track of the information that you’re counting.
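The dictionary-counting pattern can be tried out on its own, before any CSV reading is involved. This is a minimal sketch. The function name count_lengths and the sample names are made up for illustration and are not taken from the dataset.

```python
def count_lengths(names):
    """Map each name length to how many names in `names` have that length."""
    counts_by_length = {}
    for name in names:
        length = len(name)
        # get() returns 0 the first time a length is seen, so no special case is needed
        counts_by_length[length] = counts_by_length.get(length, 0) + 1
    return counts_by_length


sample_names = ["Ava", "Liam", "Noah", "Olivia", "Mia"]  # hypothetical sample
print(count_lengths(sample_names))  # {3: 2, 4: 2, 6: 1}
```

Using get() with a default of 0 keeps the loop to a single update line, which avoids an extra "is this key new?" branch.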

Note that this data has no header row, so you need to specify the field names yourself, using the fieldnames argument of csv.DictReader.
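Here is a minimal sketch of supplying field names. It assumes each line has the form name,sex,count, which is how the SSA's national yearly name files are laid out (check this against your copy). The field names, the read_rows helper, and the file name yob2021.txt are hypothetical. The example uses io.StringIO with made-up lines in place of the real file so that it runs on its own.

```python
import csv
import io

# Hypothetical field names: the file has no header row, so we supply our own.
FIELD_NAMES = ["name", "sex", "count"]


def read_rows(csv_file):
    """Return a list of dicts, one per line, keyed by FIELD_NAMES."""
    reader = csv.DictReader(csv_file, fieldnames=FIELD_NAMES)
    return list(reader)


# Made-up lines standing in for the real file, in the assumed name,sex,count layout.
rows = read_rows(io.StringIO("Ava,F,100\nLiam,M,200\n"))
print(rows[0]["name"], rows[0]["sex"], rows[0]["count"])  # Ava F 100
# Every value, including count, comes back as a string.

# Without fieldnames, DictReader treats the first line as a header row,
# so "Ava,F,100" becomes the keys and only one data row is left.
no_header = list(csv.DictReader(io.StringIO("Ava,F,100\nLiam,M,200\n")))
print(len(no_header), no_header[0]["Ava"])  # 1 Liam

# With the real data you would open the file instead (file name is an assumption):
# with open("yob2021.txt", newline="") as data_file:
#     rows = read_rows(data_file)
```

The second half shows why forgetting fieldnames is an easy bug to miss. Nothing crashes, but the first name in the file quietly disappears from your counts.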

4. Exemplary

If you complete the above successfully (and have good style in your code), you will receive nearly all the points for the assignment. If you get this far, you should feel proud of your achievements! If you want to push yourself harder and go for the exemplary grade, do the following extra challenge.

  • What is the most common first letter for female baby names, and what is the most common first letter for male baby names?
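One possible approach uses a dictionary of dictionaries: the outer one is keyed by sex, and the inner one maps each first letter to a tally. This is a sketch under the same assumed name,sex,count layout. The helper names, the weight_by_births option, and the sample lines are all hypothetical. Whether a name should count once per row or once per baby (the count column) is a judgment call the question leaves open, so the sketch supports both.

```python
import csv
import io

FIELD_NAMES = ["name", "sex", "count"]  # assumed column order; the file has no header


def first_letter_counts(rows, weight_by_births=False):
    """Return {sex: {first_letter: tally}} from name,sex,count rows."""
    tallies = {}
    for row in rows:
        letter = row["name"][0].upper()
        # Either count each name once, or count every baby given that name
        amount = int(row["count"]) if weight_by_births else 1
        letters_for_sex = tallies.setdefault(row["sex"], {})
        letters_for_sex[letter] = letters_for_sex.get(letter, 0) + amount
    return tallies


def most_common_letter(letter_tallies):
    """Return the letter with the largest tally (ties go to the first one seen)."""
    return max(letter_tallies, key=letter_tallies.get)


# Made-up sample lines, not real data
sample = io.StringIO(
    "Ava,F,100\nAmelia,F,50\nMia,F,300\nLiam,M,200\nLuca,M,10\nNoah,M,150\n"
)
rows = list(csv.DictReader(sample, fieldnames=FIELD_NAMES))
tallies = first_letter_counts(rows)
for sex, letter_tallies in tallies.items():
    print(sex, most_common_letter(letter_tallies))  # F A, then M L
```

On this sample the two interpretations disagree. Counting names, "A" wins for F. Counting babies (weight_by_births=True), "M" wins, because of Mia's 300. This is why it is worth deciding and stating which interpretation you used.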

5. Grading

You will receive an M for this assignment if…

  • your program displays the correct output
  • your program calculates the results by using the data in the file, and not based on some other means for getting the answers
  • your program uses csv.DictReader to read the data
  • your program uses dictionaries to keep track of the values that you are counting

You will receive a grade of E for this assignment if you satisfy the above M requirements, and …

  • every one of your variable names is meaningful in some way. (Names such as thing, number, etc. are not meaningful.)
  • you have at least one comment, near what you think is the trickiest part of your code, describing how it works
  • your code demonstrates a clear sequence of actions to achieve the goal at hand, and each piece is essential. Your code does not have notably more cases or conditions than it needs to.
  • you have successfully achieved the criteria described in the exemplary section above

There are no automated tests for this exercise, and if you ask before you submit, I won’t confirm or deny whether you have the right counts. When doing data analysis for real, as you’ll be doing on your final assignment, the correct counts are unknown. Think about how you might verify the results for yourselves, or what alternative approaches you might use to judge whether you’ve got approximately the right values.
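One way to act on this is to run a few consistency checks. First, confirm that the per-length totals account for every row. Second, compare the dictionary result against an independent second method, here collections.Counter. Third, run everything on a tiny hand-made file whose answer you can count yourself. This is a sketch under the same assumed name,sex,count layout, with hypothetical helper names and made-up sample lines. Passing these checks does not prove the answer is right, but a failure reliably signals a bug.

```python
import csv
import io
from collections import Counter

FIELD_NAMES = ["name", "sex", "count"]  # assumed column order; the file has no header


def lengths_with_dict(rows):
    """Count rows by name length using a plain dictionary."""
    counts_by_length = {}
    for row in rows:
        length = len(row["name"])
        counts_by_length[length] = counts_by_length.get(length, 0) + 1
    return counts_by_length


def check_length_counts(rows, counts_by_length):
    """Raise AssertionError if the counts fail a consistency check."""
    # 1. Every row should be counted exactly once.
    assert sum(counts_by_length.values()) == len(rows)
    # 2. A second, independent method should agree.
    assert dict(Counter(len(row["name"]) for row in rows)) == counts_by_length


# A tiny made-up file small enough to count by hand: lengths 3, 4, 4.
sample = io.StringIO("Ava,F,100\nLiam,M,200\nNoah,M,150\n")
rows = list(csv.DictReader(sample, fieldnames=FIELD_NAMES))
counts = lengths_with_dict(rows)
check_length_counts(rows, counts)  # passes silently
print(counts)  # {3: 1, 4: 2}
```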

Author: Dave Musicant